Comparative Study for Multi-Speaker Mongolian TTS with a New Corpus

Abstract

Low-resource text-to-speech synthesis is a very promising research direction. Mongolian is the official language of the Inner Mongolia Autonomous Region and is spoken by more than 10 million people worldwide. As a representative low-resource language, Mongolian has relatively few open-source datasets for TTS. Therefore, we make public an open-source multi-speaker Mongolian TTS dataset, named MnTTS2, for related researchers. In this work, we invited three announcers to record topic-rich speeches. Each announcer recorded 10 hours of speech, so the whole dataset is 30 hours in total. In addition, we built two baseline systems based on state-of-the-art neural architectures: a multi-speaker FastSpeech 2 model with a HiFi-GAN vocoder, and the fully end-to-end VITS model for multiple speakers. On the FastSpeech2+HiFi-GAN system, all three speakers scored 4.0 or higher on both naturalness and speaker similarity. On the VITS model, the three speakers achieved scores of 4.5 or higher on both naturalness and speaker similarity. The experimental results show that the published MnTTS2 dataset can be used to build robust multi-speaker Mongolian TTS models.
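The naturalness and speaker-similarity figures above are presumably mean opinion scores on a 1 to 5 scale. As a minimal sketch, assuming each listener gives one integer rating per utterance and that results are summarized as a mean with a normal-approximation 95% confidence interval, a speaker's score could be computed as follows. The function name `mos_with_ci` and the ratings are hypothetical placeholders, not data or code from MnTTS2.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half-width) of a normal-approximation confidence interval.

    Hypothetical helper: `ratings` is a list of integer opinion scores on a 1-5 scale.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a confidence interval")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    m = mean(ratings)
    # Standard error of the mean, using the sample standard deviation (n - 1 denominator).
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, half_width


# Hypothetical placeholder ratings for one synthetic speaker; not MnTTS2 results.
ratings = [4, 5, 4, 4, 5, 3, 4, 5, 4, 4]
score, ci = mos_with_ci(ratings)
print(f"MOS = {score:.2f} +/- {ci:.2f}")  # prints "MOS = 4.20 +/- 0.39"
print("meets 4.0 threshold:", score >= 4.0)
```

With only ten ratings, a Student t critical value (about 2.26 for 9 degrees of freedom) would give a somewhat wider interval than z = 1.96; passing it through the `z` parameter is one simple way to account for that.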

Similar resources

Corpus building for Mongolian language

This paper presents ongoing research aimed at building the first corpus for the Mongolian language, of 5 million words, focusing on annotating and tagging corpus texts according to the TEI XML (McQueen, 2004) format. In addition, a tool, MCBuilder, is presented that supports flexible manual annotation and manipulation of corpus texts with XML structure.

A New Type-II Fuzzy Logic Based Controller for Non-Linear Dynamical Systems with Application to 3-PSP Parallel Robot

Type-II fuzzy logic has shown its superiority over traditional fuzzy logic when dealing with uncertainty. Type-II fuzzy logic controllers are, however, newer and more promising approaches that have recently been applied to various fields because of their significant contribution, especially when noise (an important instance of uncertainty) emerges. During the design of type-I fuz...

Improving TTS with Corpus-Specific Pronunciation Adaptation

Text-to-speech (TTS) systems are built on speech corpora which are labeled with carefully checked and segmented phonemes. However, phoneme sequences generated by automatic grapheme-to-phoneme converters during synthesis are usually inconsistent with those from the corpus, thus leading to poor quality synthetic speech signals. To solve this problem, the present work aims at adapting automaticall...

Corpus-Based Unit Selection TTS for Hungarian

This paper gives an overview of the design and development of an experimental restricted-domain, corpus-based unit selection text-to-speech (TTS) system for Hungarian. The experimental system generates weather forecasts in Hungarian. 5260 sentences were recorded, creating a speech corpus containing 11 hours of continuous speech. A Hungarian speech recognizer was applied to label speech sound bound...

Part of Speech Tagging for Mongolian Corpus

This paper introduces the current results of research that aims to build a 5-million-word tagged corpus for Mongolian. Currently, around 1 million words have been automatically tagged by developing a POS tagset and a bigram POS tagger.

Journal

Journal title: Applied Sciences

Year: 2023

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13074237